Hierarchical assembly of the MLL1 core complex regulates H3K4 methylation and is dependent on temperature and component concentration

Enzymes of the mixed lineage leukemia (MLL) family of histone H3 lysine 4 (H3K4) methyltransferases are critical for cellular differentiation and development and are regulated by interaction with a conserved subcomplex consisting of WDR5, RbBP5, Ash2L, and DPY30. While pairwise interactions between complex subunits have been determined, the mechanisms regulating holocomplex assembly are unknown. In this investigation, we systematically characterized the biophysical properties of a reconstituted human MLL1 core complex and found that the MLL1–WDR5 heterodimer interacts with the RbBP5–Ash2L–DPY30 subcomplex in a hierarchical assembly pathway that is highly dependent on concentration and temperature. Surprisingly, we found that the disassembled state is favored at physiological temperature, where the enzyme rapidly becomes irreversibly inactivated, likely because of complex components becoming trapped in nonproductive conformations. Increased protein concentration partially overcomes this thermodynamic barrier for complex assembly, suggesting a potential regulatory mechanism for spatiotemporal control of H3K4 methylation. Together, these results are consistent with the hypothesis that regulated assembly of the MLL1 core complex underlies an important mechanism for establishing different H3K4 methylation states in mammalian genomes.

Mixed lineage leukemia protein-1 (MLL1; also known as ALL1, HRX, and KMT2A) is a member of the SET1 family of H3K4 methyltransferases and is frequently altered in poor-prognosis acute leukemias (29). MLL1 is a large protein with 3969 amino acids and assembles into a supercomplex with 30 subunits (30-33). Subunits shared among all SET1 family members include WDR5, RbBP5, Ash2L, and two copies of DPY-30 (WRAD 2 ), which associate into a subcomplex that interacts with the C-terminal SuVar, Ez, Trx (SET) domain of MLL1 (34-38). In vitro studies have shown that the MLL1 SET domain predominantly catalyzes H3K4 monomethylation (36), whereas multiple methylation depends on interaction of MLL1 with WRAD 2 , forming what is known as the MLL1 core complex (also known as human COMPASS or MWRAD 2 ) (34,36,39). The requirement of the full MWRAD 2 complex for optimal enzymatic activity suggests that H3K4 methylation may be regulated at the level of subunit assembly in the cell. Consistent with this hypothesis, genome-wide studies show that, while MLL1 localizes to thousands of genes in mammalian genomes, multiple methylation of H3K4 is mainly correlated with the subset of genes where MLL1 colocalizes with WRAD 2 subunits (40). In addition, disease-specific missense mutations have been shown to disrupt MLL family core complexes (41), suggesting that aberrations in complex assembly may be associated with human disease. More recently, several laboratories have shown that perturbation of MLL1 core complex assembly with protein-protein interaction inhibitors may have utility as a novel therapeutic approach for treating malignancies (42-44). Together, these results suggest that knowledge of the molecular mechanisms controlling MLL1 core complex assembly will be crucial for understanding how different H3K4 methylation states are regulated in mammalian genomes. However, progress has been impeded by the lack of understanding of the biophysical and thermodynamic mechanisms involved.
Biochemical reconstitution studies using a minimal MLL1 SET domain construct show that the stoichiometry of the MLL1 core complex consists of one copy of the MLL1, WDR5, RbBP5, and Ash2L subunits, and two copies of the DPY-30 subunit (MWRAD 2 ) forming a complex with a mass of 205 kDa (36). Direct interactions have been observed between MLL1 and WDR5 (35,37,45), WDR5 and RbBP5 (46,47), RbBP5 and Ash2L (36), and Ash2L and DPY30 (36,48,49). While these pairwise interactions suggest a linear arrangement of subunits, several lines of evidence indicate a more intricate quaternary structure. For example, while MLL1 does not interact with RbBP5 or Ash2L in pairwise experiments (36), an investigation of Kabuki syndrome missense mutations suggests that the MLL1 SET domain directly interacts with the RbBP5-Ash2L heterodimer within the context of the holocomplex (41). The WDR5 subunit functions to stabilize this interaction by directly binding to the MLL1 WDR5 interaction motif (35,37,45) and RbBP5 (34,36). Binding experiments show that the weakest pairwise interaction occurs between the WDR5 and RbBP5 subunits (36), suggesting the complex may be hierarchically assembled. These interactions have been confirmed in recent cryo-EM and X-ray crystal structures of related SET1 family complexes (50)(51)(52)(53). Together, these results suggest that complex assembly is hierarchical in nature, with the requirement for the formation of distinct subcomplexes before assembly of the higher-order quaternary structure. The choreographic details of this assembly pathway are unknown.
In this investigation, to better understand the MLL1 core complex assembly pathway, we systematically characterized the hydrodynamic and kinetic properties of a reconstituted human MLL1 core complex under a variety of conditions. We found that MLL1 core complex assembly is highly concentration and temperature dependent. Consistent with the hypothesized hierarchical assembly pathway, we found that the holocomplex assembles through interactions between the MW and RAD 2 subcomplexes, and that MWRAD 2 formation is correlated with enzymatic activity. Surprisingly, we found that the disassembled state is favored at physiological temperatures and at concentrations typically used in steady-state enzymatic assays. This result suggests that the complex is predominantly disassembled in a cellular context and is then assembled in regions with high local concentrations of complex components. This is consistent with the hypothesis that regulated assembly of the MLL1 core complex underlies an important mechanism for establishing different H3K4 methylation states in mammalian genomes. It also suggests that regulated assembly of chromatin-modifying and/or chromatin-remodeling complexes may be an additional layer of control over these important cellular processes, further fine-tuning their actions within the nucleus.

MLL1 core complex assembly is concentration and temperature dependent
To better understand MLL1 core complex assembly, we purified human recombinant MWRAD 2 as described in the Experimental procedures section and characterized its oligomeric behavior by size-exclusion chromatography (SEC) and sedimentation velocity analytical ultracentrifugation (SV-AUC). SEC revealed that the purified complex eluted as a single symmetrical peak (Fig. 1A), and SDS-PAGE of the indicated fractions showed the presence of all five subunits with the expected stoichiometry (Fig. 1B). We note that the complex elutes earlier than expected based on its theoretical mass, which is likely because of the significant shape asymmetry of the particle. We then chose SV-AUC to characterize the concentration and temperature dependence of the complex in solution. SV-AUC is a first-principles technique that measures the time course of sedimentation of macromolecules in an artificial gravitational field in a way that maintains the equilibrium of reversible associations, allowing extraction of equilibrium and kinetic properties of interactions (54,55). Sedimentation boundaries formed as the particles sediment over time were fit using a finite element analysis of Lamm equation solutions (Fig. 1C) (56) to give the diffusion-deconvoluted sedimentation coefficient distribution c(s) (Fig. 1D). The c(s) plot of MWRAD 2 at 5 μM loading concentration at 5 °C revealed a large peak accounting for almost 90% of the signal with an s 20,w (S) value of 7.2 and two minor peaks at 2.9 and 4.7 S that each account for 4 to 5% of the signal (noted with arrows in Fig. 1D). The major peak at 7.2 S corresponds to the fully assembled MLL1 core complex, which we previously showed assembles with a stoichiometry of 1:1:1:1:2 for the MWRAD 2 subunits, respectively (36). In addition, the S value of MWRAD 2 is independent of loading concentration (Fig. 2A), indicating that the complex is stable at 5 °C and has a relatively long lifetime compared with the timescale of sedimentation (57). Using the derived weight-averaged frictional coefficient (f/f 0 ) of 1.7, the calculated molecular mass from this S value was 209,561 Da, which is within error of the expected mass (205,402 Da) based on the amino acid sequence of the holocomplex subunits at the indicated stoichiometry.
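The relationship between the sedimentation coefficient, the frictional ratio, and the molecular mass can be sketched with the Svedberg equation combined with Stokes' law. The snippet below is only an illustration, not the calculation performed in SEDFIT: the partial specific volume (`vbar = 0.73` cm^3/g) is an assumed typical protein value that is not given in the text, and the function name `mass_from_s` is hypothetical. Under these assumptions, an s 20,w of 7.2 S and f/f 0 of 1.7 give a mass on the order of 200 kDa, in the same range as the reported value.

```python
import math

AVOGADRO = 6.02214076e23   # 1/mol
ETA_WATER_20C = 0.01002    # g/(cm*s), viscosity of water at 20 degrees C
RHO_WATER_20C = 0.99823    # g/cm^3, density of water at 20 degrees C


def mass_from_s(s20w_svedberg, f_ratio, vbar=0.73):
    """Estimate molar mass (g/mol) from s20,w (in Svedberg) and f/f0.

    vbar (cm^3/g) is an ASSUMED partial specific volume, not a value
    taken from the paper.
    """
    s = s20w_svedberg * 1e-13  # Svedberg -> seconds
    # Radius of the equivalent anhydrous sphere: r0 = c * M**(1/3).
    c = (3.0 * vbar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    buoyancy = 1.0 - vbar * RHO_WATER_20C
    # Svedberg equation s = M * buoyancy / (N_A * f), with
    # f = (f/f0) * 6 * pi * eta * r0, solved for M**(2/3).
    m_two_thirds = (
        s * AVOGADRO * f_ratio * 6.0 * math.pi * ETA_WATER_20C * c / buoyancy
    )
    return m_two_thirds ** 1.5


if __name__ == "__main__":
    print(f"{mass_from_s(7.2, 1.7):.0f} g/mol")
```

Because the mass scales as (f/f 0 )^1.5, a modest uncertainty in the frictional ratio or in the assumed partial specific volume translates into a noticeable uncertainty in the derived mass.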
The minor peaks observed in the c(s) distribution in Figure 1D could represent trace contaminants in the sample or minor populations of dissociated subcomplexes and/or subunits. To distinguish these hypotheses, we compared c(s) distributions of MWRAD 2 at concentrations ranging from 0.25 to 5 μM at 5 °C (Fig. 2A) and 30 °C (Fig. 2B). If the minor peaks represent noninteracting contaminants, then the relative amount of signal between the major and minor peaks will not vary as the loading concentration is decreased. In contrast, if the complex is dissociating into subcomplexes, then the relative amount of signal in the major and minor peaks will change as the loading concentration is varied. The results were consistent with the latter possibility. For example, while the effect at 5 °C was modest, when the loading concentration of the complex was decreased from 5 to 0.25 μM, the amount of signal corresponding to the holocomplex decreased from 88% to 83% of the total signal, with a corresponding increase in both minor peak signals (Fig. 2A). The effect was more obvious at 30 °C, which showed that the signal corresponding to the holocomplex decreased from 65% to 25% of the total signal as the loading concentration was decreased (Fig. 2B). These results suggest that the minor peaks represent dissociated subcomplexes and/or subunits. Furthermore, because the S values of the minor peaks show varying degrees of concentration dependence, they likely represent subcomplex reaction boundaries as opposed to individual noninteracting subunits. These data suggest that the holocomplex assembles from predominantly two subcomplexes in a temperature- and concentration-dependent manner.
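The logic of this dilution test follows from mass action and can be sketched numerically. The example below assumes a simple A + B ⇌ AB equilibrium with equimolar components. The dissociation constant used (0.5 μM) is a hypothetical value chosen for illustration, and the function name is not from the paper. A noninteracting contaminant would contribute the same fraction of signal at every loading concentration, whereas the bound fraction of a reversible complex falls on dilution.

```python
import math


def holocomplex_fraction(total_um, kd_um):
    """Molar fraction of equimolar A and B present as AB for A + B <=> AB.

    total_um: total concentration of each component (uM)
    kd_um:    dissociation constant (uM)
    """
    # Solve Kd = (c - x)**2 / x for x = [AB], taking the physical root.
    b = 2.0 * total_um + kd_um
    ab = (b - math.sqrt(b * b - 4.0 * total_um ** 2)) / 2.0
    return ab / total_um


if __name__ == "__main__":
    hypothetical_kd = 0.5  # uM, illustrative only
    for conc in (0.25, 0.5, 1.0, 2.5, 5.0):
        frac = holocomplex_fraction(conc, hypothetical_kd)
        print(f"{conc:5.2f} uM loading -> {100 * frac:5.1f}% in complex")
```

Note that this gives molar fractions. The percentages reported from c(s) integration are signal-weighted, so the two are related but not identical.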
The MLL1 core complex assembles from MW and RAD 2 subcomplexes
Previous experiments suggested that the holocomplex is assembled through a chain of pairwise interactions: MLL1 with WDR5, WDR5 with RbBP5, RbBP5 with Ash2L, and Ash2L with DPY-30 (36). Since the weakest pairwise interaction occurs between WDR5 and RbBP5 (36), we predicted that the complex assembles by first forming MW and RAD 2 subcomplexes, which then interact to form the holocomplex (Scheme 1). However, we reasoned that there are at least two additional reaction schemes that could give rise to the three boundaries observed in the holocomplex c(s) profiles (Schemes 2 and 3). To distinguish among these schemes, a Bayesian approach was used to analyze the SV-AUC data of the holocomplex collected at 25 °C. The Bayesian approach is a variant of the standard maximum entropy regularization method utilized in the c(s) analysis in that, instead of assuming a uniform probability for the occurrence of species at every S value in a distribution, it utilizes prior information to assign different probabilities in different regions of S values (58). A key feature of the Bayesian implementation in SEDFIT (National Institutes of Health) is that it maintains the same degrees of freedom used in the standard c(s) analysis, and therefore, imperfections in the expected values will result in additional features in the c (p) (s) plots in order to maintain the quality of the fit (58). The Bayesian analysis therefore allows us to visually determine which reaction scheme gives a c (p) (s) profile that best fits the experimental data.
To obtain the expected S values for each of the predicted subcomplexes or subunits in each reaction scheme, we mixed stoichiometric amounts of their respective subunits and characterized their concentration dependence by SV-AUC at 25 C ( Fig. S1 and Table S1). We then used each of the S values collected at 0.25 μM as prior expectations in the Bayesian analysis of the holocomplex. As shown in Figure 3A, when the independently determined S values for MW, RAD 2 , and the MWRAD 2 species were used as prior expectations in the Bayesian analysis of the holocomplex at 0.25 μM (black dotted line), three peaks in the c (p) (s) plot were observed that were in excellent agreement with the expectations (cyan line). Indeed, good agreement was observed using the same S values as prior expectations for Bayesian fits of the experimental data collected at higher holocomplex concentrations (Fig. 3A). The only deviation observed was for the position and amplitude of the holocomplex peak, which at 25 C shifts from 6.8 to 7.2 S in a concentration-dependent manner (Fig. 3A). In contrast, when a similar analysis was conducted instead using the expected S values for the MWR and AD 2 subcomplexes predicted by Scheme 2, additional features in the c (p) (s) plot with an S value of 5.3 were observed at all loading concentrations that did not match the prior expectations (Fig. 3B, red arrow). Similarly, using the expected S values for M and WRAD 2 as predicted by Scheme 3, the c (p) (s) plot showed little evidence of a species matching the expected value of free MLL1 at 2.3 S and also showed additional features at 3.5 S that did not match expectations (Fig. 3C, red arrow). 
To test whether the holocomplex assembles in a concerted fashion from individual subunits, we also performed a similar Bayesian analysis using the predetermined S values for M, W, R, AD 2 , and MWRAD 2 as prior expectations (AD 2 is treated as a discrete species since it does not appreciably dissociate under the range of concentrations that can be detected by the absorbance optical system used in these experiments (36)). The c (p) (s) plot showed additional features with an S value of 5.2 that did not match expectations (Fig. 3D, red arrow). Together, these results are consistent with the hypothesis that MLL1 core complex is hierarchically assembled by association of MW and RAD 2 subcomplexes.
The disassembled state of the MLL1 core complex is favored at physiological temperature
To further explore the thermodynamics of MLL1 core complex assembly, we compared the temperature dependence of MWRAD 2 formation at several different loading concentrations using SV-AUC (Fig. 4). Each c(s) profile was integrated, and the relative amount of signal corresponding to the S value of the holocomplex was plotted as a function of temperature and total loading concentration (Fig. 4F). At the highest loading concentration (5 μM), little variation in the amount of holocomplex was observed between 5 and 25 °C (Fig. 4, A and F), with a peak that accounted for 81 to 92% of the total signal (Table S2). In contrast, at temperatures greater than 25 °C, the amount of holocomplex decreased precipitously until only 3% of the signal could be observed at 37 °C (Fig. 4, A and F and Table S2). The effect of temperature on MLL1 core complex stability became increasingly severe as the loading concentration was decreased. For example, at the lowest loading concentration (0.25 μM), only the 5 and 10 °C runs showed 80% holocomplex (Fig. 4, E and F and Table S2); whereas at higher temperatures, the signal corresponding to the holocomplex decreased from 63% at 15 °C to 2% of the total signal at 37 °C (Fig. 4F and Table S2). At 37 °C, most of the signal is instead dominated by the two subcomplex peaks with S values of 3 and 4.7 (Fig. 4G). These data are consistent with the hypothesis that the holo-MLL1 core complex assembles from interaction of MW and RAD 2 , the equilibrium of which is highly concentration and temperature dependent.
Surprisingly, at all the concentrations tested, very little holocomplex with an S value of 7.2 was observed at physiological temperature (37 C) (Fig. 4G). This suggests that the disassembled state of the MLL1 core complex may predominate in cells and that other factors are required to stabilize the assembled state. In support of this hypothesis, closer examination of the c(s) profiles of the complex at 37 C revealed evidence that increased protein concentration promotes complex formation. For example, while similar amounts of signal are observed in the two subcomplex peaks at the 0.25 μM loading concentration (cyan line, Fig. 4G), the relative amount of signal in the two peaks changes with progressively higher concentrations. The intensity of the larger peak increased at the expense of the smaller peak and began to show evidence of concentration-dependent shifting to higher S values. This hydrodynamic behavior is consistent with a reaction boundary composed of free and bound reactants that interconvert under a rapid kinetic regime that cannot be resolved within the signal to noise of the experiment (57). These results suggest that, unlike the long lifetime of the assembled complex observed at 5 C, the kinetics of the interaction have changed at 37 C such that the complex now has a short lifetime compared with the timescale of sedimentation.
We next analyzed the concentration series at each temperature to derive binding isotherms. We integrated each c(s) profile (between 0.5 and 9.5 S) to determine the weight-average sedimentation coefficients (s w ) (59), which were then plotted against MWRAD 2 concentration and fit to derive the apparent dissociation constant (K d app ) for each isotherm (Fig. 5A). Given that the majority of signal in each c(s) profile could be assigned to three peaks, we applied the A + B ⇆ AB heteroassociation model in the program SEDPHAT (National Institutes of Health) (60) and obtained reasonable fits (Table 1). The derived K d app values ranged from 7 nM at 5 °C to 6200 nM at 37 °C (Table 1). A van't Hoff analysis showed that complex formation is exothermic, which is offset by the negative entropy change as the complex subunits become more ordered (Fig. 5, B and C). However, the van't Hoff plot reveals a nonlinear relationship between K eq and temperature, indicating a change in the heat capacity of the system at higher temperatures (Fig. 5B). These data suggest at least two mechanisms for complex assembly, which differ by temperature. At low temperatures (≤25 °C), the equilibrium favors complex formation, with a relatively long lifetime that is stable on the timescale of sedimentation. Under this mechanism, the interaction is dominated by enthalpic contributions to the free energy (Fig. 5C). At high temperatures (>25 °C), the equilibrium is shifted into the rapid kinetic regime with a short complex lifetime where dissociation is more likely. While there is little difference in the Gibbs free energy between mechanisms, there is a difference in the contributions between the enthalpic and entropic terms. At higher temperatures, the entropic penalty to complex formation was increased sevenfold compared with that of the lower temperature mechanism, whereas the difference in the enthalpic contribution was only increased by 3.8-fold (Fig. 5C). These results suggest that, at physiological temperature, one or more of the subunits samples alternate conformational states, some of which are not competent for complex assembly. However, given the observation that some holocomplex forms in a concentration-dependent manner, increased local concentration of subunits may be a mechanism that cells use to overcome the increased entropic cost of complex formation at 37 °C.
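The van't Hoff analysis described above can be sketched as a linear regression of ln K a against 1/T, where the slope gives the enthalpy and the intercept gives the entropy of association. The code below is a minimal illustration under stated assumptions. The function names are hypothetical, a 1 M standard state is assumed, and the example input is synthetic data generated from made-up ΔH and ΔS values rather than the values in Table 1. Because the reported van't Hoff plot is nonlinear, such a fit would have to be applied separately to the low and high temperature ranges, as the authors did.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def delta_g_assoc(kd_molar, temp_c):
    """Standard free energy of association (kcal/mol): dG = R*T*ln(Kd)."""
    return R_KCAL * (temp_c + 273.15) * math.log(kd_molar)


def vant_hoff_fit(temps_c, kds_molar):
    """Least-squares fit of ln(Ka) = -dH/(R*T) + dS/R.

    Returns (dH in kcal/mol, dS in kcal/(mol*K)).
    """
    x = [1.0 / (t + 273.15) for t in temps_c]
    y = [-math.log(kd) for kd in kds_molar]  # ln Ka = -ln Kd
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return -slope * R_KCAL, intercept * R_KCAL


if __name__ == "__main__":
    # Synthetic Kd values from hypothetical dH = -20 kcal/mol and
    # dS = -0.04 kcal/(mol*K); these are NOT the paper's values.
    temps = [5.0, 10.0, 15.0, 20.0, 25.0]
    kds = [
        math.exp((-20.0 + 0.04 * (t + 273.15)) / (R_KCAL * (t + 273.15)))
        for t in temps
    ]
    dh, ds = vant_hoff_fit(temps, kds)
    print(f"dH = {dh:.2f} kcal/mol, dS = {1000 * ds:.1f} cal/(mol*K)")
```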
Enzymatic activity of the MLL1 core complex is directly related to complex assembly
To determine the impact of concentration and temperature on the enzymatic activity of the MLL1 core complex, we incubated MWRAD 2 (0.25-5 μM) with a fixed concentration of histone H3 peptide (10 μM) and saturating amounts of AdoMet (250 μM) at various temperatures. We then measured methylation using a label-free quantitative MALDI-TOF mass spectrometry assay (36). MALDI spectra were integrated, and the relative amount of each peptide species was plotted as a function of time. Data were fit using a numerical integration of rate equations approach implemented in KinTek (KinTek Corporation) Explorer software (61), which allowed us to test the ability of different reaction models to fit the data. Using the simplest irreversible consecutive reactions model (Fig. 6, model 1), while acceptable fits were obtained for reaction progress curves collected at the highest concentration (5 μM) between 5 and 30 °C (5 °C is shown in Fig. 6A), the rest of the fits were poor (an example is shown in Fig. 6B). Since we previously showed that the complex uses a nonprocessive mechanism for multiple lysine methylation (36), we revised the model to incorporate binding of peptide substrate to the enzyme-AdoMet complex (E 1 ) and release of the H3K4me1 product after the first methylation event, followed by binding of the H3K4me1 substrate to a distinct site on the enzyme (E 2 ) for the dimethylation reaction. The latter step is predicated on our previous observation that the MLL1 core complex has a cryptic second active site independent of the SET domain that is required for the H3K4 dimethylation reaction (36,62,63). Since the binding and release rates of substrates and products are currently unknown, these values were fixed to be non-rate limiting. This model allowed us to incorporate an additional term to test the impact of reversible complex disassembly, which results in negligible activity of both enzymes under these assay conditions (Fig. 6, model 2) (36,37). Initial values for the ratio (k off /k on ) for complex assembly were set to be equal to the K d app derived from each SV-AUC isotherm experiment.
Figure 4 (caption fragment). [...] (Table S2). These values were obtained as described in the Experimental procedures section. G, c(s) distributions from five MWRAD 2 concentrations at 37 °C normalized by total integrated area (note: each distribution corresponds to the black line from the respective concentration panel in A-E). The position of holo-MWRAD 2 at 7.2 S is indicated with the arrow. MLL, mixed lineage leukemia; MWRAD 2 , MLL core complex with WDR5, RbBP5, Ash2L, and two copies of DPY-30; SV-AUC, sedimentation velocity analytical ultracentrifugation.
The resulting simulations showed that adding a reversible complex disassembly step to the reaction scheme only modestly improved fits to the lower temperature data (Fig. 6C) but did not improve the fits of the higher temperature data (Fig. 6D). In addition, FitSpace confidence contour analysis (64) showed that the derived k off value for the complex dissociation step was not constrained by the data (not shown), suggesting that the underlying mechanism is more complex than model 2. Closer examination of the high temperature data showed that several reactions failed to go to completion, suggesting the enzyme rapidly inactivates at higher temperatures (Fig. 7). We therefore revised the working model to incorporate an irreversible enzyme inactivation step (k inact ) (Fig. 6, model 3). The resulting simulations yielded good fits to both the low and high temperature datasets shown in Figure 6, E and F, respectively. In addition, FitSpace analysis showed that the derived pseudo-first-order rate constants for monomethylation (k me1 ) and dimethylation (k me2 ) reactions were reasonably well constrained by the data (Fig. 6, G and H). Furthermore, the rate of enzyme inactivation (k inact ) was constrained by the data in the higher temperature experiments (Fig. 6H) but not in the lower temperature experiments (Fig. 6G), where enzyme inactivation is negligible. Figure 7 shows that the use of model 3 produces good fits for all datasets.
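The qualitative effect of an irreversible inactivation step on reaction progress curves can be illustrated with a reduced version of model 3. The sketch below is not the KinTek Explorer model used in the study. It omits the substrate binding and product release steps and the reversible assembly step, treats k me1 and k me2 as pseudo-first-order constants scaled by the fraction of enzyme still active, and uses hypothetical rate constants. It integrates the rate equations with a fourth-order Runge-Kutta scheme.

```python
import math


def simulate_model3(s0, k_me1, k_me2, k_inact, t_end, dt=0.01):
    """Integrate me0 -> me1 -> me2 with first-order loss of active enzyme.

    Returns (me0, me1, me2, active_fraction) at time t_end.
    """
    def deriv(state):
        me0, me1, me2, active = state
        v1 = k_me1 * active * me0   # monomethylation flux
        v2 = k_me2 * active * me1   # dimethylation flux
        return (-v1, v1 - v2, v2, -k_inact * active)

    state = (s0, 0.0, 0.0, 1.0)
    for _ in range(int(round(t_end / dt))):
        # Classical fourth-order Runge-Kutta step.
        k1 = deriv(state)
        k2 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = deriv(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(
            s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


if __name__ == "__main__":
    # Hypothetical constants (per minute); 10 uM peptide as in the assay.
    me0, me1, me2, active = simulate_model3(10.0, 0.1, 0.05, 0.1, 200.0)
    print(f"me0={me0:.3f} me1={me1:.3f} me2={me2:.3f} active={active:.2e}")
    # In this reduced model the unmethylated peptide plateaus at
    # s0 * exp(-k_me1 / k_inact) instead of reaching zero.
    print(f"analytic me0 plateau: {10.0 * math.exp(-0.1 / 0.1):.3f}")
```

When k inact is comparable to the turnover constants, the simulated reaction stalls with substrate remaining, which is the signature of incomplete conversion described for the higher temperature time courses. With k inact set to zero the same code reduces to the simple consecutive reactions scheme.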
Based on these results, we then used the fits to model 3 to compare the impacts of temperature and concentration on the enzymatic activity of the MLL1 core complex (Fig. 8). The obtained pseudo-first-order rate constants for monomethylation (k me1 ), dimethylation (k me2 ), and the rate of enzyme inactivation (k inact ) are summarized in Tables S3-S5, respectively. At most of the tested enzyme concentrations, activity increased linearly as the temperature increased from 5 to 20 °C (Fig. 8, A and C). However, above 20 °C, non-Arrhenius behavior was observed, as the rate of irreversible enzyme inactivation (k inact ) rivaled or exceeded the rates of turnover (Tables S3-S5), resulting in reactions that failed to go to completion (Fig. 7). These results are consistent with the conclusions from the SV-AUC analysis, which suggested that as the complex dissociates at higher temperatures, one or more of the subunits undergoes an irreversible conformational change that is not competent for catalysis. We therefore plotted the natural logarithm of the k me1 and k me2 rate constants as a function of inverse temperature (1/T) between 5 °C and 20 °C to fit the data to the Arrhenius equation (Fig. 8, B and D, respectively). Linear fitting of the Arrhenius plots revealed similar values for the energy of activation (E a ) between the tested concentrations. The average E a values were 10.9 ± 2.0 kcal/mol and 17.8 ± 4.7 kcal/mol for the monomethylation and dimethylation reactions, respectively.
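The Arrhenius analysis amounts to a straight-line fit of ln k against 1/T, with the activation energy obtained from the slope. A minimal sketch is given below. The function name is hypothetical, and the example rate constants are synthetic values generated from an assumed E a and pre-exponential factor rather than the measured constants in Tables S3 and S4.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def arrhenius_fit(temps_c, rate_constants):
    """Fit ln(k) = ln(A) - Ea/(R*T); return (Ea in kcal/mol, A)."""
    x = [1.0 / (t + 273.15) for t in temps_c]
    y = [math.log(k) for k in rate_constants]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # slope = -Ea / R
    return -slope * R_KCAL, math.exp(my - slope * mx)


if __name__ == "__main__":
    # Synthetic rate constants over the 5-20 degrees C window, generated
    # from an assumed Ea of 10.9 kcal/mol and an arbitrary A of 1e6.
    temps = [5.0, 10.0, 15.0, 20.0]
    ks = [1e6 * math.exp(-10.9 / (R_KCAL * (t + 273.15))) for t in temps]
    ea, a = arrhenius_fit(temps, ks)
    print(f"Ea = {ea:.2f} kcal/mol, A = {a:.3g}")
```

Restricting the fit to the temperature window where inactivation is negligible matters: including points where k inact competes with turnover would bend the plot and bias the fitted slope.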
The minimum enzyme concentration resulting in complete conversion into the monomethylated and then dimethylated forms was 1.0 μM at 15 °C (Fig. 7). Slightly higher activity was observed at the same enzyme concentration at 20 °C but with evidence of significant enzyme inactivation resulting in failure to go to completion. Increased concentration extended the range of temperatures under which complete conversion could be observed. For example, at 5 μM enzyme concentration, complete conversion of the peptide into the dimethylated form was observed between 5 °C and 30 °C, with evidence of modest H3K4 trimethylation activity (7-15%) between 10 °C and 25 °C (Fig. 7). However, at 37 °C, only 25% of the peptide was converted into the dimethylated form before the enzyme was completely inactivated.
Figure 5. Thermodynamic characterization of MLL1 core complex assembly. A, signal-weighted (s w ) isotherms of MWRAD 2 were obtained for each temperature, plotted against loading concentration, and fit to an A + B ⇋ AB heteroassociation model using SEDPHAT. The lines represent the fits for each isotherm, which were conducted at 5 °C (blue), 10 °C (purple), 15 °C (cyan), 20 °C (green), 25 °C (gray), 30 °C (orange), and 37 °C (red). K d app values are summarized in Table 1. B, van't Hoff plot derived from the apparent K eq values. Linear regression was used to independently fit the data for the high temperature range (red, 25-37 °C) and low temperature range (blue, 5-25 °C). C, summary of thermodynamic parameters for MLL1 core complex assembly under high and low temperature regimes derived from the van't Hoff analysis in (B). MLL, mixed lineage leukemia; MWRAD 2 , MLL core complex with WDR5, RbBP5, Ash2L, and two copies of DPY-30.

Discussion
In this investigation, we systematically characterized the hydrodynamic and kinetic properties of a reconstituted human MLL1 core complex under a variety of assay conditions. We found that complex assembly is highly concentration and temperature dependent. Consistent with the hypothesized hierarchical assembly pathway, we found that the holocomplex assembles through interactions between the MW and RAD 2 subcomplexes and that this assembly is correlated with enzymatic activity. However, unexpectedly, we also found that the disassembled state of the complex is favored at physiological temperatures and at the submicromolar enzyme concentrations typically used in steady-state enzymatic assays (in which the substrate is in vast excess compared with the concentration of enzyme). We found that the complex disassembly results in rapid and irreversible enzyme inactivation under these conditions, likely because one or more individual subunits samples unproductive conformational states.
These results have immediate implications for the study of this and other multisubunit enzyme complexes, as MWRAD 2 activity assays and inhibition studies have been performed under a variety of temperatures and concentrations across laboratories. Based on our results, we recommend that in vitro assays be conducted with at least 1 μM MLL1 core complex and at 15 °C. Reactions at concentrations below this threshold fail to go to completion, as do reactions above 15 °C (Fig. 7). Variation from these experimental conditions likely underlies variation in IC 50 values for inhibitors targeting the WDR5 interaction motif-WDR5 interaction in different laboratories, making it difficult to identify the best inhibitors. For example, we previously found that an approximately fourfold change in [MWRAD 2 ] resulted in an approximately eightfold change in the IC 50 values obtained for inhibition of complex activity (65). It is therefore important to establish that the full complex is assembled and stable at the desired assay concentrations and temperatures over the duration of the experiment. Differences in MWRAD 2 assembly state most likely underlie lab-to-lab variability in observed methylation rates for the SET1 family core complexes.
Verification of complex formation is also essential for obtaining reasonable k app values for H3K4 methylation because SET1 family SET domains are slow monomethyltransferases in the absence of WDR5, RbBP5, and Ash2L (36). The presence of a disassembled species in activity assays could result in kinetic constants being underestimated, perhaps significantly, depending on the percentage of complex in this state. In addition, several SET1 family members (MLL1, MLL4, Setd1a, and Setd1b) gain the ability to multiply methylate H3K4 in the presence of WRAD 2 (36,66). Without prior knowledge of complex formation and stability at assay concentrations and temperature, multiple methylation of H3K4 could be lost for the SET1 complex under investigation. This effect, coupled with techniques that do not distinguish between lysine methylation states, such as 3 H-mediated fluorography and fluorescence-coupled reactions, could easily result in the omission of multiple methylation rates or in their being inadvertently grouped into the monomethylation rate estimate. This issue can be alleviated by using previously methylated substrate peptides and by keeping observations within the linear range of a reaction. Ideally, however, lysine methylation kinetics should be measured with a method that distinguishes each methylation state and quantifies the amount of each species present in the reaction at each time point. Both quantitative, label-free MALDI-TOF MS (36,66) and quantitative, continuous-detection 13 C-methyl incorporation NMR (67) have proven to be valuable methods for lysine multiple methylation studies such as these.
Figure 6 (caption fragment). G, FitSpace confidence contour analysis for the reaction catalyzed by 5 μM MWRAD 2 at 5 °C fit with model 3. k inact is not constrained by the data, mainly because of the absence of detectable enzyme inactivation during the reaction time course at 5 °C. H, FitSpace confidence contour analysis of the fit of model 3 to the reaction catalyzed by 1 μM MWRAD 2 at 25 °C. k inact is now constrained by the data. MWRAD 2 , MLL core complex with WDR5, RbBP5, Ash2L, and two copies of DPY-30.
Figure 7. Temperature and concentration dependence of MLL1 core complex enzymatic activity. Time courses for reactions at the indicated MWRAD 2 concentrations and temperatures were plotted and fit using model 3. Each time point represents the average from two independent experiments. Concentrations of each peptide species were plotted in red for H3K4me0, green for H3K4me1, and blue for H3K4me2. For reactions showing small amounts of H3K4me3 (yellow), model 3 was modified to incorporate an additional turnover step followed by product release. Note: Figure 6, E and F are reused in this figure, as they are the 5 μM MWRAD 2 at 5 °C (bottom left panel) and 1 μM MWRAD 2 at 25 °C (fourth row; fifth column) reaction time courses, respectively, fit with model 3. H3K4, histone H3 lysine 4; MLL, mixed lineage leukemia; MWRAD 2 , MLL core complex with WDR5, RbBP5, Ash2L, and two copies of DPY-30.
While the concerns these results highlight for in vitro characterization of MLL1 core complex activity can be addressed in the ways described above, they also lead directly to a question about its in vivo nature: how does this complex form at physiological temperature? Our studies of MWRAD₂ at 37 °C show a strong tendency toward disassembly, one that could potentially be overcome by increasing the concentration of the complex components to many tens of micromolar, as suggested by the s_w isotherm results (Fig. 5A). This suggests a model in which the basal state of the complex is disassembled into MW and RAD₂ subcomplexes. Given that RbBP5 and Ash2L can interact with nucleosomes on their own (data not shown), and given their high concentration in cells compared with MLL1 (68), it seems likely that RAD₂ interacts with nucleosomes first. Subsequently, MW, which exists at the C-terminal end of a long, flexible, intrinsically disordered region of the MLL1 primary sequence (Fig. S2) (69), swings in and binds to complete core complex assembly directly on the nucleosome face once concentrations have reached the critical threshold (Fig. 9). This "swinging domain" mechanism would allow for very precise control of H3K4 methylation throughout the chromatin landscape and may explain why WRAD₂ subunits exist in a vast stoichiometric excess relative to MLL1 (68). It is possible that additional factors are also required for more stable complex formation, such as post-translational modifications of complex components (70). The 3700 missing residues of MLL1 in the construct studied here and/or the presence of additional subunits could also be responsible for conferring a more stable complex architecture in the cell.
However, in support of a "mass action" mechanism, we have shown that the MLL1 core complex can be induced to undergo liquid-liquid phase separation in vitro (in preparation), where high local concentrations of complex subunits overcome the entropic barrier for complex assembly. Consistent with this mechanism, it has previously been demonstrated that MLL1 has a punctate distribution within the nucleus (71) and that it colocalizes with RNA polymerase II in transcription factories (72). Entry of the MLL1 core complex into dense regions such as these may allow sufficient activity against the chromatin substrate, providing a potential mechanism for spatial and temporal control of H3K4 methylation.
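The concentration dependence invoked by the mass action argument can be illustrated with a minimal sketch. It assumes a simple two-component equilibrium, MW + RAD₂ ⇌ MWRAD₂, with equal total concentrations of both subcomplexes. The function name and the K_d values below are hypothetical and chosen only for illustration; they are not the fitted values from the s_w isotherm analysis.

```python
import math

def fraction_assembled(total_uM, kd_uM):
    """Fraction of MW bound in complex for A + B <=> AB with equal totals.

    Solves Kd = (c - x)^2 / x for the complex concentration x, where c is
    the total concentration of each component (all values in micromolar).
    """
    if total_uM <= 0 or kd_uM <= 0:
        raise ValueError("concentration and Kd must be positive")
    b = 2.0 * total_uM + kd_uM
    complex_uM = (b - math.sqrt(b * b - 4.0 * total_uM * total_uM)) / 2.0
    return complex_uM / total_uM

if __name__ == "__main__":
    # Hypothetical dissociation constants: a tight and a weak regime.
    for kd in (0.1, 10.0):
        for conc in (0.25, 1.0, 5.0, 50.0):
            frac = fraction_assembled(conc, kd)
            print(f"Kd = {kd:5.1f} uM, total = {conc:5.2f} uM, assembled = {frac:.2f}")
```

Under these assumptions, a weaker apparent K_d requires a proportionally higher loading concentration to reach the same assembled fraction, which is the qualitative behavior the mass action mechanism relies on.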

Protein expression and purification
The human genes encoding the MLL1 SET domain (amino acids 3745-3969; UniProt ID: Q03164), WDR5 (amino acids 2-334; UniProt ID: P61964), RbBP5 (amino acids 1-538; UniProt ID: Q15291), and Ash2L (amino acids 1-534; UniProt ID: Q9UBL3-3) (73) were cloned into the pST44 polycistronic vector (74). The WDR5 subunit was cloned with an N-terminal 6x-Histidine tag followed by a tobacco etch virus protease cleavage site. Plasmids were transformed into Rosetta pLysS BL21 Escherichia coli cells and plated on LB agar supplemented with 50 μg/ml carbenicillin and 20 μg/ml chloramphenicol (both from Gold Biotechnology). Individual colonies were used to inoculate a seed culture of 50 ml of Terrific Broth II (MP Biomedicals), again supplemented with carbenicillin and chloramphenicol, and grown overnight at 30 °C. About 20 ml of the seed culture was used to inoculate 1 l of Terrific Broth II media in baffled 2800 ml flasks, maintaining antibiotic selection. Cultures were then grown for 2 to 4 h at 37 °C with shaking at 200 RPM until the absorbance at 600 nm reached 1. Cultures were then chilled for 1 h at 4 °C followed by induction with 1 mM IPTG (Gold Biotechnology), after which cells were grown for an additional 20 to 22 h at 16 °C with constant shaking. Cells were harvested by centrifugation at 4 °C, and pellets were flash frozen in liquid nitrogen and stored at −80 °C until they could be lysed. Frozen cells were thawed and resuspended in 50 ml of lysis buffer (50 mM Tris-HCl, pH 7.5; 300 mM NaCl; 30 mM imidazole; 3 mM DTT; and 1 μM ZnCl₂, supplemented with one tablet of EDTA-free protease inhibitor cocktail [Roche]), lysed with a microfluidizer, and cleared by centrifugation at 17,000 RPM at 4 °C for 30 min. The supernatant was diluted to 250 ml in buffer 1 (50 mM Tris-HCl, pH 7.5; 300 mM NaCl; 30 mM imidazole; 3 mM DTT; and 1 μM ZnCl₂) and flowed over a HisTrap 5 ml nickel affinity column (GE) using an AKTA Purifier FPLC (GE) at a rate of 0.5 ml/min.
Bound complex was washed with 10 column volumes of buffer 1 at 1 ml/min and then eluted with a 25-column volume linear gradient of buffer 2 (buffer 1 with 500 mM imidazole). Fractions containing the MWRA complex were pooled, supplemented with glutathione-S-transferase-6x-His-tobacco etch virus protease to a final concentration of 0.1 mg/ml, and dialyzed against buffer 1 with three changes. The complex was then passed over a re-equilibrated HisTrap column, and fractions from the flow-through containing the cleaved MWRA sample were collected, concentrated to 15 ml by ultrafiltration using a 30 kDa cutoff membrane, and further purified by multiple rounds of SEC using a Superdex 200 (16/60) column (GE) preequilibrated with buffer 3 (20 mM Tris-HCl, pH 7.5; 300 mM NaCl; 1 mM Tris(2-carboxyethyl)phosphine; and 1 μM ZnCl₂), with 5 ml sequential injections. The resulting fractions of pure MWRA were concentrated to less than 5 ml, and a twofold molar excess of human DPY-30 (amino acids 1-99; UniProt ID: Q9C005), expressed and purified as previously described (36), was added to the sample. The resultant complex was purified with multiple rounds of SEC in buffer 3. Fractions containing purified MWRAD₂ were concentrated to 12 mg/ml, aliquoted, flash frozen, and stored at −80 °C until use. Individual subunits for Bayesian experiments were purified as previously described (36).
Figure 9. Model of MWRAD₂ assembly on the nucleosome core particle (NCP). MW can bind to RAD₂ both in solution and prebound to NCP. The RAD₂ subcomplex binds to nucleosome in the absence of MW, primarily through nonspecific DNA contacts. Figure made using Protein Data Bank ID: 7UD5 (79). MWRAD₂, MLL core complex with WDR5, RbBP5, Ash2L, and two copies of DPY-30.

SV-AUC
Experimental set-up
All stock protein samples were thawed on ice, diluted to the desired concentration, and spun at 15,000 RPM for 15 min at 4 °C using a Thermo Scientific tabletop refrigerated centrifuge to remove any debris. Protein concentrations were measured with a NanoDrop spectrophotometer using the extinction coefficient ε₂₈₀ of 248,954 M⁻¹ cm⁻¹, which was predicted from the amino acid sequence using ProtParam (Expasy.org) (75). About 100 or 400 μl of diluted protein samples were then loaded into AUC cells containing 3 mm or 12 mm two-sector charcoal-Epon centerpieces (Spin Analytical) assembled with quartz or sapphire windows. Matching buffer was loaded into the reference sector of each cell. AUC cells were then loaded into a Ti-60 4-hole Beckman-Coulter rotor that was preequilibrated to the specific run temperature for at least 4 h. Rotors were then inserted into the chamber of the centrifuge and allowed to re-equilibrate to the experimental temperature for a minimum of 2 h before initiation of the run. SV-AUC was performed using a Beckman-Coulter Proteomelab XL-A analytical ultracentrifuge equipped with absorbance optics. Each run was preceded by a wavelength scan at 3000 rpm to detect cell leakage and to select the appropriate wavelength to ensure a starting absorbance of between 0.25 and 1.2 absorbance units. Wavelengths at or near the absorbance maximum for aromatics (280 nm) or for the peptide backbone (230 nm) were selected, depending on the protein concentration and pathlength of the centerpiece. Without slowing the rotor, a method scan at 50,000 rpm was initiated, and 200 scans/cell were collected with the time interval between scans set to zero. Each experiment was performed in duplicate or triplicate.
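The concentration measurement and the choice of centerpiece pathlength both follow from the Beer-Lambert law. The sketch below uses the ε₂₈₀ value quoted above; the function names are hypothetical, and it assumes that absorbance readings are expressed for the pathlength passed to the function. No extinction coefficient at 230 nm is given in the text, so only 280 nm is considered.

```python
EPSILON_280 = 248_954.0  # M^-1 cm^-1, ProtParam-predicted value quoted in the text

def concentration_uM(a280, pathlength_cm=1.0, epsilon=EPSILON_280):
    """Beer-Lambert law, c = A / (epsilon * l), returned in micromolar."""
    if pathlength_cm <= 0 or epsilon <= 0:
        raise ValueError("pathlength and epsilon must be positive")
    return a280 / (epsilon * pathlength_cm) * 1e6

def expected_a280(conc_uM, pathlength_cm, epsilon=EPSILON_280):
    """Predicted absorbance at 280 nm for a given loading concentration."""
    return conc_uM * 1e-6 * epsilon * pathlength_cm

def in_loading_window(absorbance, low=0.25, high=1.2):
    """True if the starting absorbance falls in the 0.25 to 1.2 AU range."""
    return low <= absorbance <= high

if __name__ == "__main__":
    # 12 mm and 3 mm centerpieces correspond to 1.2 cm and 0.3 cm pathlengths.
    for conc in (0.25, 1.0, 5.0):
        for path in (0.3, 1.2):
            a = expected_a280(conc, path)
            print(f"{conc:4.2f} uM, {path} cm: A280 = {a:.3f}, usable = {in_loading_window(a)}")
```

In this sketch the lowest concentration falls below the 0.25 AU window at 280 nm even in the 12 mm centerpiece, which is consistent with the use of 230 nm detection for dilute samples.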

Data analysis
Lamm equation modeling of all SV-AUC results was performed using the continuous distribution (c(s)) method in SEDFIT (56). Maximum entropy (ME) regularization using a confidence level of p = 0.68 was performed to identify the most parsimonious distribution consistent with the data, and the fits for each experiment gave acceptable RMSD values ranging between 0.003 and 0.01. Density, viscosity, and partial specific volume values were estimated by inputting the temperature, buffer reagents, and amino acid sequences of all five complex components (assuming a DPY-30 dimer) into the SEDNTERP (https://bitc.sr.unh.edu/index.php/Main_Page) program (76), and the values used are listed in Table S6. The resulting c(s) distributions were displayed and further analyzed using GUSSI (https://www.utsouthwestern.edu/research/corefacilities/mbr/software/) (77). To determine the amount of holocomplex under each condition, distributions were integrated between S values of 6.8 and 7.6, a window spanning one standard deviation on either side of the mean S value of the holocomplex peak over all conditions (7.2 ± 0.4 S). For binding analyses, c(s) distributions were integrated from 0.5 to 9.5 S to derive the corresponding signal-weighted average sedimentation coefficients (s_w), which were plotted as a function of loading concentration at each temperature and fit with mass action law models using the program SEDPHAT (78).
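The two integrations described above can be sketched numerically. The example below is not the GUSSI or SEDPHAT implementation; it applies a plain trapezoid rule to a synthetic, hypothetical c(s) distribution, and it simply discards grid points outside the integration window rather than interpolating at the window edges.

```python
import math

def trapezoid(xs, ys):
    """Trapezoid-rule integral of ys over xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
               for i in range(len(xs) - 1))

def window(s, c, lo, hi):
    """Keep only grid points with lo <= s <= hi (no edge interpolation)."""
    pairs = [(si, ci) for si, ci in zip(s, c) if lo <= si <= hi]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def integrated_signal(s, c, lo, hi):
    """Signal under the c(s) curve between lo and hi."""
    ws, wc = window(s, c, lo, hi)
    return trapezoid(ws, wc)

def signal_weighted_s(s, c, lo=0.5, hi=9.5):
    """s_w = integral(s * c(s) ds) / integral(c(s) ds) over the window."""
    ws, wc = window(s, c, lo, hi)
    total = trapezoid(ws, wc)
    if total <= 0:
        raise ValueError("no signal in the integration window")
    return trapezoid(ws, [si * ci for si, ci in zip(ws, wc)]) / total

# Synthetic distribution: one Gaussian peak at 7.2 S (hypothetical width 0.2 S).
s_grid = [0.5 + 0.01 * i for i in range(901)]
c_holo = [math.exp(-0.5 * ((si - 7.2) / 0.2) ** 2) for si in s_grid]

holo_fraction = (integrated_signal(s_grid, c_holo, 6.8, 7.6)
                 / integrated_signal(s_grid, c_holo, 0.5, 9.5))
s_w = signal_weighted_s(s_grid, c_holo)
```

For a real data set, `s_grid` and `c_holo` would be replaced by the exported c(s) distribution; when smaller species are present, `s_w` shifts below the holocomplex value in proportion to their signal.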
For Bayesian analyses of c(s) distributions, expected sedimentation coefficients were derived from separate SV-AUC experiments of individual subunits or assembled subcomplexes, which were each run at concentrations ranging from 0.25 to 5 μM at 25 °C (the data for 0.25 μM runs are shown in Fig. S1). These values were then used in ME regularization as prior expectation restraints to give c^(p)(s) distributions of the holocomplex at 25 °C. Prior expectations for subcomplexes or individual subunits were implemented as Gaussians in SEDFIT for Bayesian analysis, with a peak width of sigma = 0.2 S, centered at the weight-average S value of the main peak observed in the individual experiments, and with an amplitude of 0.05 absorbance units. Since the prior expected S values for WDR5 and RbBP5 overlapped when run in individual experiments, they were used as prior expectations in c^(p)(s) distributions to test the concerted assembly mechanism with the same weight-average S value but with an amplitude that was doubled (Fig. 3D). Each c^(p)(s) distribution was fit with the same prior expectation for MWRAD₂, which used the weight-average S value determined at 25 °C and 0.25 μM with a width of sigma = 0.4 S and an amplitude of 0.3 absorbance units.

Methyltransferase activity assay
MWRAD₂ complex was assayed using a label-free quantitative MALDI-TOF mass spectrometry assay (36). Each 20 μl reaction consisted of varying concentrations of MWRAD₂, 250 μM AdoMet, and reaction buffer (50 mM Tris, pH 9.0; 200 mM NaCl; 5% (v/v) glycerol; 1 μM ZnCl₂; and 3 mM DTT), which were preincubated for 5 min at the experimental temperature in a thermocycler. Reactions were initiated by the addition of temperature-pre-equilibrated histone H3 peptide (residues 1-20, with an additional C-terminal GGK-biotin moiety) to a final concentration of 10 μM. At various time points, a 2 μl aliquot was removed and quenched by mixing with 2 μl of 1% TFA. Quenched reactions were stored at −20 °C until they could be analyzed. Samples were thawed, and 1 μl of each was mixed with 4 μl of α-cyano-4-hydroxycinnamic acid in 0.5% TFA and 50% acetonitrile. About 2 μl of this mixture for each time point was spotted onto a ground steel target plate and allowed to dry at room temperature for 3 to 12 h. Spectra were acquired on a Bruker Autoflex III MALDI-TOF mass spectrometer in reflectron mode. Each spectrum was the sum of at least 1000 individual laser shots, obtained from five different positions around the spot, with 200 shots at each position. Using FlexAnalysis software (Bruker), the intensities of the unmodified (m/z 2651 Da), monomethylated (m/z 2665 Da), dimethylated (m/z 2679 Da), and trimethylated (m/z 2693 Da) species were summed to obtain the total intensity. The relative amount of each species was then determined by dividing the intensity of each methylation state by the total intensity at each time point and multiplying by the starting substrate concentration (10 μM) to give the micromolar concentration of each methylation state. These data were then plotted as a function of time for kinetics analyses.
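The conversion from peak intensities to species concentrations amounts to a normalization, sketched below. The function name and the example intensities are hypothetical; the m/z values and the 10 μM starting substrate concentration are taken from the text.

```python
# Peptide species and their m/z values as listed in the text.
SPECIES_MZ = {"me0": 2651, "me1": 2665, "me2": 2679, "me3": 2693}

def species_concentrations(intensities, substrate_uM=10.0):
    """Convert peak intensities to micromolar concentrations.

    Each intensity is divided by the summed intensity of all four
    methylation states and multiplied by the starting substrate
    concentration.
    """
    total = sum(intensities[name] for name in SPECIES_MZ)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {name: intensities[name] / total * substrate_uM
            for name in SPECIES_MZ}

# Hypothetical intensities (arbitrary units) for a single time point.
example = {"me0": 500.0, "me1": 300.0, "me2": 200.0, "me3": 0.0}
conc = species_concentrations(example)
```

Because the four concentrations are constrained to sum to the starting substrate concentration, this normalization assumes that all methylation states ionize with similar efficiency.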
Fitting of the data was performed using the numerical integration of rate equations approach implemented in KinTek Explorer software, version 6.3 (61). For reaction models incorporating the complex dissociation step, the ratio k_off/k_on was constrained to be equal to the estimated K_d^app for complex dissociation at each temperature determined from the sedimentation velocity s_w isotherm analysis, with k_on fixed at the limit of diffusion. All other nonvariable parameters were fixed at non-rate-limiting values. Confidence contour analysis using a χ² threshold of 0.9 was used to obtain estimates for the extent to which each variable parameter was constrained by the data.
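The dissociation constraint can be written as k_off = K_d^app × k_on. The sketch below works through that arithmetic only; it does not reproduce the KinTek Explorer fitting. The diffusion-limited k_on of 1e9 M⁻¹ s⁻¹ and the K_d values are assumed, hypothetical numbers, since the values used in the fits are not stated in this section.

```python
K_ON_DIFFUSION = 1.0e9  # M^-1 s^-1, assumed diffusion-limited value (hypothetical)

def constrained_k_off(kd_app_uM, k_on=K_ON_DIFFUSION):
    """Return k_off (s^-1) such that k_off / k_on equals the apparent Kd."""
    if kd_app_uM <= 0 or k_on <= 0:
        raise ValueError("Kd and k_on must be positive")
    kd_molar = kd_app_uM * 1e-6  # convert micromolar to molar
    return kd_molar * k_on

# Hypothetical apparent Kd values (micromolar) at three temperatures (deg C).
kd_by_temperature = {5: 0.01, 25: 1.0, 37: 50.0}
k_off_by_temperature = {t: constrained_k_off(kd)
                        for t, kd in kd_by_temperature.items()}
```

Fixing k_on and deriving k_off in this way leaves the equilibrium position set entirely by the independently measured K_d^app, so the kinetic fit does not need to determine the dissociation step from the time courses alone.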

Statistical methods
Pearson's linear correlation coefficient was used to assess relationships between H3K4 methylation rates and the biophysical parameters. The significance of the Pearson correlation coefficient was evaluated using a t test in XLStat (Addinsoft) software.
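For reference, the Pearson coefficient and its t statistic can be computed as in the sketch below, which is independent of XLStat. The data are made up for illustration. The p value is not computed because it requires the t distribution with n − 2 degrees of freedom, which is not available in the Python standard library.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("samples must not be constant")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Made-up example: five paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
```

The resulting t value is compared against the critical value of the t distribution for n − 2 degrees of freedom to assess significance.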

Data availability
The raw SV-AUC data and MALDI-TOF time-course methyltransferase data used for analysis are available upon reasonable request.
Supporting information: This article contains supporting information (69).